Conversation
Yep, seems something is not quite right yet. |
**cfg["media_proc_cfg"],
}
# merge configs
self.preprocessor_config = {**self.preprocessor_config, **cfg}
self.preprocessor_config is empty at this point, so the merge is not really necessary, but I will allow it for consistency.
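A minimal sketch of the merge discussed in this review comment, assuming preprocessor_config.json nests its fields under media_proc_cfg as the PR description states. The helper name `flatten_media_cfg` and the sample keys and values are hypothetical and only there for illustration.

```python
def flatten_media_cfg(raw: dict) -> dict:
    """Lift the fields nested under "media_proc_cfg" to the top level."""
    nested = raw.get("media_proc_cfg", {})
    top = {k: v for k, v in raw.items() if k != "media_proc_cfg"}
    # Later entries win on key clashes, so nested fields override top-level ones.
    return {**top, **nested}


# Illustrative input only; these keys are NOT taken from the real file.
raw = {
    "processor_class": "Example",
    "media_proc_cfg": {"patch_size": 14, "merge_kernel_size": [2, 2]},
}

preprocessor_config = {}  # empty at this point, as the review notes
cfg = flatten_media_cfg(raw)
# Merging into an empty dict just copies cfg, which is why the merge is optional.
preprocessor_config = {**preprocessor_config, **cfg}
```

Because the left-hand dict is empty, the merged result equals `cfg`; the merge only matters if earlier code has already populated `preprocessor_config`.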
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key
|
Vision is working now for images, uploaded MMPROJ files to my repo. @ngxson I left comments about the places that confused me the most.
I think the rest of the changes are pretty sane. |
|
Great work, AesSedai! I just downloaded the BF16 mmproj. Is there any reason to get anything higher than Q8_0? What about ctk/ctv: is there any good reason to run them in f16 instead of something lower, since the model is INT4? |
|
@segmond Thanks, for the MMPROJ some cards are more or less compatible with different versions. The BF16's don't work very well on my 3090s IIRC. The Q8_0 should be fine to use quality-wise. Regarding CTK / CTV, you do not want to quantize the cache on this model at all. The model weight quantization is different than the cache quantization. MLA / GQA already comes with some pretty severe compression on the cache so by further quantizing it you'll degrade it more. Besides, the context is very lightweight anyways. Something like 165k context in FP16 is like ballpark 10GB or so. |
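A back-of-the-envelope check of the "ballpark 10GB" figure, as a sketch only. It assumes a DeepSeek-V3-style MLA cache that stores one compressed latent plus one RoPE key per token per layer, with 61 layers, a latent rank of 512, and a RoPE dimension of 64. None of these numbers appear in this thread, and the real cache layout may differ.

```python
def mla_cache_bytes(n_tokens, n_layers=61, kv_lora_rank=512, rope_dim=64, bytes_per_elem=2):
    """Estimate MLA cache size; all default hyperparameters are assumptions."""
    per_token_per_layer = kv_lora_rank + rope_dim  # compressed latent + RoPE key
    return n_tokens * n_layers * per_token_per_layer * bytes_per_elem


total = mla_cache_bytes(165_000)  # 165k tokens at 2 bytes per element (FP16)
gib = total / 2**30
print(f"{gib:.1f} GiB")  # roughly 10.8 GiB under these assumptions
```

Under the same assumptions, halving `bytes_per_elem` would save only about 5 GiB, which is small next to the model weights.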
|
@tempgidam Huh, interesting. I haven't seen any double-vision issues on my setup. I did update the first shard yesterday evening to adjust the chat template since MoonshotAI updated the media token. There's a minimal prompt on my ST setup ("A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions."). There was also a fix for the thinking block issue in the chat template that required an update a couple of days ago. Could you download a fresh copy of the first shard of the quant and re-test? I'd be surprised if there wasn't a bug lurking in there somewhere, which is why I'm waiting on ngxson to review. |
|
I used AesSedai's model (IQ2_XXS + Q8_0 mmproj) and tempgidam's image of a circle. Then prompted this:
And it generated me this:
<svg viewBox="0 0 400 300" xmlns="http://www.w3.org/2000/svg">
<circle cx="100" cy="150" r="80" fill="none" stroke="black" stroke-width="15"/>
<circle cx="320" cy="150" r="80" fill="none" stroke="black" stroke-width="15"/>
</svg>
I had to disable the thinking, because with thinking enabled it just rambled on and on for 3000 tokens, endlessly correcting itself, and then gave me an unusable SVG in the end. However, in its ramblings it said this:
...which doesn't correspond to the image above, but still mentions that one of the circles is cut off. Only in the image, the right circle is cut off slightly. So maybe there is something to it, or maybe it's just a hallucination because the quant is too small.
The whole thinking process of the failed run (which was NOT used to generate the above image):
The user wants me to create an approximate SVG file of the image shown. The image displays two black outlined circles (or ring shapes) on a white background. They appear to be positioned side by side, with the left one partially cut off on the left side and the right one fully visible or partially cut off on the right. Actually, looking more carefully, it seems like two circles with thick black strokes, positioned horizontally next to each other. Let me analyze the image:
To recreate this in SVG:
Looking at the proportions:
Actually, looking again, it seems the left circle is positioned so we see mostly its right side, and the right circle is positioned so we see its left side? No, wait. Let me reconsider. The image shows:
Wait, looking carefully at the original image description and the visual content:
Actually, looking at the image again, it seems like:
But wait, the left shape looks like it could be a full circle with the left side cut off by the image edge, and the right shape is a full circle. Let me create an SVG with:
Actually, looking at the aspect ratio, the image is wider than it is tall? No, the provided image seems to have a certain aspect ratio. Let me assume the viewBox is something like 0 0 200 100. Better approximation:
Looking at the image once more:
Actually, looking at the bottom alignment:
Let me go with:
But to make it look like the image where the left one is cut off:
Wait, looking at the image provided, the left circle appears to be positioned so we see the right side of it (like a "D" shape), and the right circle is a full "O" shape. Actually, no - the left one looks like it might be a full circle with the left side cut off by the frame. Let me create an SVG that captures the essence:
<svg width="200" height="100" viewBox="0 0 200 100" xmlns="http://www.w3.org/2000/svg">
<!-- Left circle, partially cut off -->
<circle cx="40" cy="50" r="35" fill="none" stroke="black" stroke-width="8" />
<!-- Right circle -->
<circle cx="140" cy="50" r="35" fill="none" stroke="black" stroke-width="8" />
</svg>
But looking at the image, the left circle might be positioned at around x=30 (if width is 100), and the right at x=70. Actually, the image looks like it might be showing:
Let me adjust:
Stroke width should be roughly 1/5 to 1/4 of the radius? Looking at the image, the stroke is quite thick. If radius is 40, stroke might be 8-10. Let me provide a responsive SVG that approximates the visual:
<svg viewBox="0 0 160 100" xmlns="http://www.w3.org/2000/svg">
<!-- Left circle (partially visible) -->
<circle cx="35" cy="50" r="30" fill="none" stroke="black" stroke-width="8" />
<!-- Right circle -->
<circle cx="115" cy="50" r="30" fill="none" stroke="black" stroke-width="8" />
</svg>
This should create two thick-outlined circles, with the left one partially cut off on the left side if the viewBox starts at 0 and the circle extends to x=5 (35-30). Actually, looking at the image one more time, I think the left circle is positioned such that its left side is cut off, showing a "C" shape (or rather, the right side of a circle), and the right circle is fully visible. The vertical position seems roughly centered. Let me refine:
Wait, the spacing between them looks like roughly one radius or so. Final check:
So my SVG should be:
<svg width="300" height="150" viewBox="0 0 300 150" xmlns="http://www.w3.org/2000/svg">
<circle cx="75" cy="75" r="60" fill="none" stroke="black" stroke-width="12" />
<circle cx="225" cy="75" r="60" fill="none" stroke="black" stroke-width="12" />
</svg>
But with the left one positioned so its left side is cut off. So maybe cx="60" so left edge is at 0. Actually, looking at the original image, the left circle seems to be positioned with roughly 1/4 to 1/3 of it visible? No, it looks like more than half is visible. Maybe 2/3 or 3/4 of the circle is visible on the left. Let me go with:
Wait, looking at the provided image crops:
So the left circle is indeed partially cut off on the left side of the image. I'll provide an SVG with viewBox="0 0 200alpXD circle at (50,50) with r=40 and another at (150,50) with r=40, stroke-width around 8. Actually, to make the left one cut off: cx=40, r=40 (so x from 0 to 80). Final SVG:
<svg viewBox="0 0 200 100" xmlns="http://www.w3.org/2000/svg">
<circle cx="···" cy="50" r="40" fill="none" stroke="black" stroke-width="8" />
<circle cx="160" cy="50" r="40" fill="none" stroke="black" stroke-width="8" />
</svg>
With cx="40" for the left one (so it extends from 0 to 80), and cx="160" for the right one (80 to 160? No, 120 to 200). Wait, if cx=160 and r=40, it goes from 120 to 200. Spacing: from 80 (right edge of left) to 120 (left edge of right) is 40 units gap. That seems like a big gap. In the image, the gap looks smaller than the radius. Maybe:
This looks better. Or simply:
<svg viewBox="0 0 180 100" xmlns="http://www.w3.org/2000/svg">
<circle cx···" cy="50" r="35" fill="none" stroke="black" stroke-width="8"/>
<circle cx="130" cy="50" r="35" fill="none" stroke="black" stroke-width="8"/>
</svg>
Left circle at 45: range 10-80
Actually, looking at the image, the gap seems to be about one stroke-width or so, or maybe 1/4 of the diameter. I'll provide a clean, simple approximation that captures the essence.
<svg viewBox="0 0 200 operativeCircle cx="45" cy="50" r="35" fill="none" stroke="black" stroke-width="8"/>
<circle cx="140" cy="50" r="35" fill="none" stroke="black" stroke-width="8"/>
</svg> |
|
Okay, I replaced the first shard with the updated one (and removed the --chat-template-file parameter which was loading the modified template from the hf discussion thread to fix the thinking tags with the old version). Sadly, this doesn't seem to change anything for me. I'm getting the same results across the board. I also tried some other things: However, nothing changed. The model still sees artifacts and seams across all of them. |
|
Thanks both for the feedback, I'll continue to tinker with this and see if I can figure out the issue :) |
|
@tempgidam / @z80maniac I've got something that might be worth testing on your setups. In the following file: Try swapping the numbers I might have goofed that permute and it's swapping w/h incorrectly before the |
|
@AesSedai Yes, with this patch my test creates only one circle:
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200">
<circle cx="100" cy="100" r="80" fill="none" stroke="black" stroke-width="16"/>
</svg>
This SVG is an almost exact representation of the original picture:
And in the reasoning it doesn't say that there are two circles:
|
|
@AesSedai Thanks for your work! |
|
Thanks for validating, I've pushed that change to the branch now. Much appreciated! |
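To illustrate why a width/height mix-up in a patch-merging step can scramble what the vision tower sees, here is a toy sketch in plain Python. It is not the actual patch (the file and the numbers to swap are not shown in this thread); the `merge_blocks` helper, the 2x4 grid, and the 2x2 merge size are all hypothetical.

```python
def merge_blocks(flat, h, w, k=2):
    """Group a row-major h x w grid of patch ids into k x k neighbourhoods."""
    blocks = []
    for br in range(0, h, k):
        for bc in range(0, w, k):
            blocks.append(
                [flat[(br + dr) * w + (bc + dc)] for dr in range(k) for dc in range(k)]
            )
    return blocks


h, w = 2, 4                    # toy grid: 2 rows, 4 columns of patches
patches = list(range(h * w))   # patch ids 0..7 in row-major order

correct = merge_blocks(patches, h, w)  # true 2x2 spatial neighbours
swapped = merge_blocks(patches, w, h)  # same data, height and width exchanged

print(correct)  # [[0, 1, 4, 5], [2, 3, 6, 7]]
print(swapped)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With the dimensions exchanged, each merged block combines patches that are not spatial neighbours, so the downstream model receives a rearranged image even though every patch is still present.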














Adding support for https://huggingface.co/moonshotai/Kimi-K2.5
Since this model includes compressed-tensors (INT4 for the conditional experts), I moved the dequant_model to the prepare_tensors call at @compilade's suggestion. The model conversion fails otherwise because the quantization_config is nested under the text_config in the config.json.
Additionally, this model adds some new keys for the vision tower, prefixed as vt_, and the preprocessor_config.json has the expected fields nested in the media_proc_cfg key.
This PR does not include the "hacked" Q4_0 changes by @jukofyork, referred to in this comment.
I have added a first pass at vision support, heavily aided by LLM assistance. I entirely expect @ngxson to tear it to shreds or call me a dummy and show me an easier way to add that vision support :)
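The nesting issue described above (quantization_config living under text_config rather than at the top level of config.json) can be sketched as a lookup with a fallback. The function name `find_quantization_config` and the sample values are hypothetical; this is not the conversion script's actual code.

```python
def find_quantization_config(config: dict):
    """Return quantization_config from the top level or from text_config, else None."""
    if "quantization_config" in config:
        return config["quantization_config"]
    return config.get("text_config", {}).get("quantization_config")


# Illustrative config only; the field values are made up for this example.
config = {
    "model_type": "example",
    "text_config": {
        "quantization_config": {"quant_method": "compressed-tensors"},
    },
}

quant_cfg = find_quantization_config(config)
print(quant_cfg)  # {'quant_method': 'compressed-tensors'}
```

A lookup that only checks the top level would return nothing for this layout, which matches the failure mode the description mentions.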